Discrete Point Based Signatures and Applications to Document Matching

Authors

  • Nemanja Spasojevic
  • Guillaume Poncin
  • Dan S. Bloomberg
Abstract

Document analysis often starts with robust signatures, for instance, for document lookup from low-quality photographs or for similarity analysis between scanned books. Signatures based on OCR typically work well, but they require good-quality OCR, which is not always available and can be very costly. In this paper we describe a novel scheme for extracting discrete signatures from document images. It operates on points that describe the position of words, typically the centroid. Each point is extracted using one of several techniques and assigned a signature based on its relation to the nearest neighbors. We discuss the benefits of this approach and demonstrate its application to multiple problems, including fast image similarity calculation and document lookup.
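
The abstract gives no details of how a signature is built, so the following is only a minimal sketch of the general idea and not the authors' scheme. It assumes word centroids are already available as (x, y) pairs, takes the k nearest neighbors of each point, and packs the quantized directions to those neighbors into one integer. The function names and the parameters `k` and `bins` are hypothetical, and the Jaccard comparison of signature sets is an assumed stand-in for whatever similarity measure the paper uses.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def point_signature(points: List[Point], index: int, k: int = 4, bins: int = 8) -> int:
    """Discrete signature of one point from the directions to its k nearest neighbors.

    Illustrative assumption only: the paper's actual encoding is not given
    in the abstract.
    """
    px, py = points[index]
    others = [p for i, p in enumerate(points) if i != index]
    # k nearest neighbors by squared Euclidean distance.
    nearest = sorted(others, key=lambda q: (q[0] - px) ** 2 + (q[1] - py) ** 2)[:k]
    # Angles to the neighbors in [0, 2*pi), sorted so the result does not
    # depend on the order in which neighbors were found.
    angles = sorted(math.atan2(qy - py, qx - px) % (2 * math.pi) for qx, qy in nearest)
    signature = 0
    for angle in angles:
        # Quantize each angle into one of `bins` sectors and append it as a digit.
        sector = int(angle / (2 * math.pi) * bins) % bins
        signature = signature * bins + sector
    return signature


def page_signatures(points: List[Point], k: int = 4, bins: int = 8) -> Set[int]:
    """Set of signatures for all points on a page."""
    return {point_signature(points, i, k, bins) for i in range(len(points))}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two signature sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Because the signature depends only on relative directions between neighboring points, shifting every centroid by the same offset leaves the signature set unchanged. That is the kind of robustness a lookup from a cropped or offset photograph would need. Rotation and perspective distortion are not handled by this sketch.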

Similar resources

Signature-Based Document Image Retrieval

As the most pervasive method of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. In this work, we developed a fully automatic signature-based document image retrieval system that handles: 1) Automatic detection and segment...

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval system is fundamentally based on: Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Signature-based Document Image Retrieval

As the most pervasive method of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image search and retrieval in a wide range of applications. In this work, we developed a fully automatic signature-based document image retrieval system that handles: 1) Automatic detect...

Elliptic Curve Digital Signatures and Accessories

Digital signatures have been used in Internet applications to provide data authentication and non-repudiation services. Digital signatures will keep on playing an important role in future Internet applications. There are two most well-known public-key cryptosystems, the RSA scheme and the ElGamal scheme, which can provide both digital signature and data encryption. More recently, the...

Publication date: 2011